A generalization of the Kullback-Leibler divergence and its properties 
A generalized Kullback-Leibler relative entropy is introduced starting with the symmetric Jackson 
derivative of the generalized overlap between two probability distributions. The generalization 
retains much of the structure possessed by the original formulation. We present the fundamental 
properties including positivity, metricity, concavity, bounds and stability. In addition, a connection 
to shift information and behavior under Liouville dynamics are discussed. 
I. INTRODUCTION 



The relative entropy, or information divergence, measures the extent to which an assumed probability 
distribution deviates from the true one. We need a means of comparing two different probability distributions 
and will define a distance as a fundamental quantity that discriminates the distributions. The form of the relative 
entropy first introduced by Kullback and Leibler [1] is the most pervasive measure in information theory [2] 
and statistical mechanics [3]. Its most prominent properties are its asymmetry under interchange of the two 
distributions and its failure to satisfy the triangle inequality [4]. Recently, the parametrized 
entropy has gained a great deal of attention in the physics and information literature [5], in an effort to gain a deeper 
understanding of the structure of equilibrium statistical mechanics and an improved perspective on information theory. 
Due to the close connection between entropy and relative entropy, a generalized Kullback-Leibler (KL) relative entropy 
was presented, whose form is in conformity with a generalized entropy [6, 7]. In terms of the information gain, the 
generalization of the KL relative entropy has led to the adoption of a generalized information content. 

The purpose of this paper is to introduce another extended KL divergence and to investigate its fundamental 
properties. Our construction of the new divergence measure is transparent, and the origin of the form of the already existing 
generalization [6, 7] can be explained once we realize that the seed quantity is a generalized overlap of the two 
distributions. The fundamental properties of the newly introduced generalization of KL that we treat in this paper 
include positivity, metricity, form invariance, concavity, upper and lower bounds, and stability. In addition, the 
quantum no-cloning theorem [8] has recently been shown to possess a classical counterpart, where universal perfect 
cloning machines are incompatible with the conservation of a distance measure under the Liouville dynamics governing 
the evolution of the statistical ensemble. This fact was first shown through the ordinary KL divergence [9] and was 
afterward extended to the Csiszar f-divergence [10]. Furthermore, it was shown that this fact also applies to a 
non-f-divergence type [11]. As a particular instance of the f-divergence, we shall specifically show that our generalized 
measure also exhibits constancy under this linear evolution dynamics. 

The organization of the paper is as follows. In section II, we first recapitulate the basic properties of two different 
quantities: a distance in terms of KL, and the overlap of the distributions. We examine a specific stability property 
in order to clarify the difference between KL and the overlap. This property will be investigated for our measure in 
the subsequent section. In section III, we present our generalization by way of the symmetric Jackson derivative. 
Some basic properties are addressed in section IV. We summarize our results in the final section. 



II. DISTANCE AND OVERLAP 



The discrimination between two different probability functions is important in physics and information theory. To 
gain more insights into how we can measure the difference, we first consider the relationship between the KL distance 
and the overlap, which is also called the fidelity. Suppose that n statistically independent subsystems constitute 
a system so that the joint probability distribution can be written in a factorized form $\mathcal{P}_m = \mathcal{P}_m^{(1)} \cdots \mathcal{P}_m^{(n)}$, where 
m = 1, 2. The KL distance $\mathcal{K}(\mathcal{P}_1, \mathcal{P}_2)$ between $\mathcal{P}_1$ and $\mathcal{P}_2$ defined on continuous support with $dx = dx^{(1)} \cdots dx^{(n)}$ is 

$$\mathcal{K}(\mathcal{P}_1, \mathcal{P}_2) = \int dx\, \mathcal{P}_1 \ln\frac{\mathcal{P}_1}{\mathcal{P}_2} = \sum_{i=1}^{n} \mathcal{K}(\mathcal{P}_1^{(i)}, \mathcal{P}_2^{(i)}). \qquad (1)$$

On the other hand, the overlap $\mathcal{O}(\mathcal{P}_1, \mathcal{P}_2)$ between $\mathcal{P}_1$ and $\mathcal{P}_2$ is comprised of the overlaps of the subsystems and is 
expressed as 

$$\mathcal{O}(\mathcal{P}_1, \mathcal{P}_2) = \int dx\, \sqrt{\mathcal{P}_1 \mathcal{P}_2} = \prod_{i=1}^{n} \mathcal{O}(\mathcal{P}_1^{(i)}, \mathcal{P}_2^{(i)}). \qquad (2)$$

The KL distance is a sum of distances of the independent component systems (decomposability property), while 
the overlap for the total system is constructed from a product of the overlaps of subsystems. We note that, as a 
measure closely related to the overlap, the statistical distance has been introduced in [12]; it is based on the 
number of distinguishable states between two probabilities and can be given as the inverse cosine of the overlap, 
$\cos^{-1} \mathcal{O}(\mathcal{P}_1, \mathcal{P}_2)$. We used the continuous form in the above; however, subtleties exist between the continuous and 
discrete forms of entropies, as clearly stated in [13]. We shall use both forms depending on the ease of presentation in 
the rest of this paper. 
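
The contrast between Eq. (1) and Eq. (2) can be illustrated numerically in the discrete form. The following Python sketch is only an illustration under stated assumptions: the two-subsystem distributions are made-up values, and the helper names `kl`, `overlap` and `joint` are hypothetical, not taken from the paper.

```python
import math
from itertools import product

def kl(p, q):
    """Discrete KL distance: sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def overlap(p, q):
    """Discrete overlap (fidelity): sum_i sqrt(p_i q_i)."""
    return sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))

def joint(factors):
    """Joint distribution of independent subsystems, flattened to a list."""
    return [math.prod(values) for values in product(*factors)]

# Illustrative (assumed) subsystem distributions.
p_parts = [[0.2, 0.8], [0.5, 0.3, 0.2]]
q_parts = [[0.4, 0.6], [0.1, 0.6, 0.3]]
P, Q = joint(p_parts), joint(q_parts)

kl_joint = kl(P, Q)
kl_sum = sum(kl(p, q) for p, q in zip(p_parts, q_parts))

overlap_joint = overlap(P, Q)
overlap_product = math.prod(overlap(p, q) for p, q in zip(p_parts, q_parts))

# Statistical distance as the inverse cosine of the overlap.
statistical_distance = math.acos(overlap_joint)
```

Here `kl_joint` agrees with `kl_sum` (additivity), while `overlap_joint` agrees with `overlap_product` (multiplicativity), up to rounding.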



A. Stability 

Stability in general has a quite broad meaning and has various definitions. The stability depends on the degrees of 
responses to the external perturbation. We shall now consider a situation where an external environment disturbs a 
system described by a set of probabilities. As a consequence of the disturbance, only a specific state of the system 
may slightly change the probability, say, by a factor $\epsilon$. Alternatively, we could also describe our setup as follows. 
The fluctuation of the target system is so small that its influence on the probability states could be limited and it 
appears only between two states. Due to the normalization of the entire probability, if the probability of a certain 
state is altered, then another state is also changed. This may be a matter of time scales inherent to the system. 
Although the fluctuation may initially occur locally in states space, it propagates in the neighboring states, and the 
reconfiguration of the probability distribution of the system occurs immediately towards a (quasi) equilibrium or 
static states of the system. Instead of considering the long-term stability, we limit our concern to a very early stage 
of the response. The long-term dynamical stability requires the introduction of an underlying physics and is outside the 
scope of the present treatment. 

We define $\{p(x_i; \epsilon)\}$ as the distribution after an infinitesimal change denoted by a factor $\epsilon$, which is assumed 
to be close to unity. The evolution of the system is attributed to the change of the probabilities in time. Hence, 
two distributions $\{p(x_i)\}$ and $\{p(y_i)\}$ are assumed to be connected by $p(y_i) = \sum_{k=1}^{n} p(y_i|x_k) p(x_k)$, i.e., the linear 
transformation of one state into another [14]. We may also regard this in the context of information theory as a 
transmission of input states $p(x_i)$ through the channel matrix $p(y_i|x_k)$ to obtain output states $p(y_i)$. Then, the output 
states that received the disturbance are expressed as $p(y_i; \epsilon) = \sum_{k=1}^{n} p(y_i|x_k) p(x_k; \epsilon)$. Without loss of generality, we 
assume that the fluctuation affects only two states (the $l$th and $m$th) in such a way that [15] 

$$p(x_k; \epsilon) = \begin{cases} \epsilon\, p(x_l) & \text{for } k = l, \\ (1 - \epsilon)\, p(x_l) + p(x_m) & \text{for } k = m, \\ p(x_k) & \text{for others}. \end{cases} \qquad (3)$$

This appears to correspond to the situation in which a certain external fluctuation boosts the visiting frequency of a 
particular state. From the above, we have $p(y_i; \epsilon) = p(y_i) + c_i (\epsilon - 1) p(x_l)$, where we have set $c_i = p(y_i|x_l) - p(y_i|x_m)$. 
Let us put $\epsilon - 1 = \xi$. We then have 

$$\mathcal{K}(\{p(y_i; \epsilon)\}, \{p(y_i)\}) = \sum_{i=1}^{n} p(y_i) \left(1 + \xi c_i \frac{p(x_l)}{p(y_i)}\right) \ln\left(1 + \xi c_i \frac{p(x_l)}{p(y_i)}\right). \qquad (4)$$

Since we are considering the case $\xi \ll 1$, expanding the logarithm and keeping terms up to second order 
in $\xi$, we have 

$$\mathcal{K}(\{p(y_i; \epsilon)\}, \{p(y_i)\}) = \Big(\sum_i c_i\Big) p(x_l)\, \xi + \frac{1}{2} \Big(\sum_i \frac{c_i^2\, p^2(x_l)}{p(y_i)}\Big) \xi^2 + O(\xi^3). \qquad (5)$$



Therefore, we always have a positive second derivative $d^2\mathcal{K}/d\xi^2 > 0$, which means that the distance is stable under 
this disturbance. On the other hand, the overlap between $p(y_i; \epsilon)$ and $p(y_i)$ is calculated to be 

$$\mathcal{O}(\{p(y_i; \epsilon)\}, \{p(y_i)\}) = \sum_{i=1}^{n} \sqrt{p(y_i; \epsilon)\, p(y_i)} = \sum_i p(y_i) \sqrt{1 + \xi c_i \frac{p(x_l)}{p(y_i)}} \simeq \sum_i p(y_i) \left[1 + \frac{1}{2} c_i \frac{p(x_l)}{p(y_i)} \xi - \frac{1}{8} c_i^2 \left(\frac{p(x_l)}{p(y_i)}\right)^2 \xi^2\right], \qquad (6)$$

where we have approximated the last line using $\sqrt{1 + \alpha\xi} \simeq 1 + \alpha\xi/2 - \alpha^2\xi^2/8 + \cdots$. Therefore, the second derivative 
$d^2\mathcal{O}/d\xi^2 = -p(x_l)^2 \big(\sum_i c_i^2/p(y_i)\big)/4 < 0$, which implies instability in this framework. Note that for two identical 
distributions, the KL distance vanishes while the overlap becomes unity. Therefore, the impact of the fluctuating 
effect on the two distance measures appears in the coefficients of $\xi^n$ ($n \ge 1$). 
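
The opposite signs of the second-order coefficients can be checked numerically. The sketch below is an illustration only: the 3-state channel matrix, the input distribution and the choice of the perturbed states are assumed values, and the names `output` and `perturb` are hypothetical. It compares the exact KL distance and overlap with the quadratic approximations $\frac{1}{2}\sum_i c_i^2 p^2(x_l)/p(y_i)\,\xi^2$ and $1 - \frac{1}{8}\sum_i c_i^2 p^2(x_l)/p(y_i)\,\xi^2$ (the first-order terms vanish here because each column of the channel matrix sums to one, so $\sum_i c_i = 0$).

```python
import math

# channel[i][k] = p(y_i | x_k); every column sums to one (assumed values).
channel = [[0.7, 0.2, 0.1],
           [0.2, 0.5, 0.3],
           [0.1, 0.3, 0.6]]
px = [0.5, 0.3, 0.2]   # input distribution p(x_k), assumed
l, m = 0, 1            # the two states affected by the fluctuation

def output(p_in):
    """Output distribution p(y_i) = sum_k p(y_i | x_k) p(x_k)."""
    return [sum(row[k] * p_in[k] for k in range(len(p_in))) for row in channel]

def perturb(p_in, eps):
    """Two-state perturbation: scale state l by eps, compensate in state m."""
    p_new = list(p_in)
    p_new[l] = eps * p_in[l]
    p_new[m] = (1.0 - eps) * p_in[l] + p_in[m]
    return p_new

xi = 1e-3                                   # xi = eps - 1
py = output(px)
py_eps = output(perturb(px, 1.0 + xi))
c = [row[l] - row[m] for row in channel]    # c_i = p(y_i|x_l) - p(y_i|x_m)
coeff = sum(ci**2 * px[l]**2 / pyi for ci, pyi in zip(c, py))

kl_value = sum(a * math.log(a / b) for a, b in zip(py_eps, py))
overlap_value = sum(math.sqrt(a * b) for a, b in zip(py_eps, py))

kl_quadratic = 0.5 * coeff * xi**2             # grows with xi^2 (stable)
overlap_deficit = coeff * xi**2 / 8.0          # overlap drops below unity
```

With these values the KL distance increases quadratically from zero, whereas the overlap decreases quadratically from unity.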



III. A GENERALIZED KULLBACK-LEIBLER ENTROPY 



The relative entropy can be defined in more than one way, and it is therefore possible to introduce alternative definitions to 
the conventional KL if needed. Some classes are actually discussed in [16]. Although extensions of the usual KL 
entropy were already proposed by several authors in different forms, their presentations are somewhat heuristic and 
the mathematical origins are not fully clear. In this section, we consider a generalization of the KL entropy in light 
of the Jackson derivative [17] and illustrate some of its properties in the next section. The Jackson derivative has its 
root in quantum group theory and has already been used to produce the Tsallis type generalized entropy [18]. The 
Jackson derivative of a function $f(a)$ is defined for $s \neq 1$ by 

$$\frac{d f(a)}{d_s a} := \frac{f(sa) - f(a)}{(s - 1)\, a}. \qquad (7)$$

The limit $s \to 1$ corresponds to the ordinary derivative. The generalized entropy of Tsallis [5] is obtained when the 
derivative is applied to the quantity $Z(a) = \sum_i p_i^a$ and evaluated at a = 1 [18], i.e., 

$$-\left.\frac{d Z(a)}{d_s a}\right|_{a=1} = \frac{1 - \sum_i p_i^s}{s - 1}. \qquad (8)$$



Keep in mind that if we apply the derivative $d/d_s a$ to another quantity and evaluate it at different values of 
a, we can in principle obtain different types of generalized entropies. In this scheme, we shall employ the quantity 
$Z'(a) = \sum_i p_i^a q_i^{1-a}$ for obtaining a new class of generalized KL entropy. This quantity was called the Renyi overlap 
of order a [19] because the quantity $\ln Z'(a)/(a - 1)$ is defined by Renyi [20, 21]. We note that the usual KL entropy is 
obtained by the ordinary derivative of $Z'(a)$ when evaluated either at a = 1, 

$$\left.\frac{d Z'(a)}{d a}\right|_{a=1} = \sum_i p_i \ln\frac{p_i}{q_i} = \mathcal{K}(\mathcal{P}, \mathcal{Q}), \qquad (9)$$

or at a = 0, 

$$-\left.\frac{d Z'(a)}{d a}\right|_{a=0} = \sum_i q_i \ln\frac{q_i}{p_i} = \mathcal{K}(\mathcal{Q}, \mathcal{P}). \qquad (10)$$

By the same token, the generalized KL entropy introduced previously in [6, 7] is generated by applying $d/d_s a$ 
to $Z'(a)$ and substituting a = 1, 

$$\left.\frac{d Z'(a)}{d_s a}\right|_{a=1} = \frac{\sum_i p_i^s q_i^{1-s} - 1}{s - 1}. \qquad (11)$$

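These facts indicate that we can produce various kinds of generalized KL entropies by evaluating 
the Jackson derivative of $Z'(a)$ at different values of a. It is also possible to take another approach to achieve a 
generalization by employing the symmetric Jackson derivative, defined for a function $g(a)$ with $s \neq 1$ by 

$$\frac{d^{\mathrm{sym}} g(a)}{d_s a} := \frac{g(sa) - g(s^{-1}a)}{(s - s^{-1})\, a}. \qquad (12)$$

This derivative is symmetric under the interchange $s \leftrightarrow s^{-1}$. We apply the symmetric Jackson derivative to 
$Z'(a)$ and evaluate it at a = 1, 

$$\mathcal{L}_s(\mathcal{P}, \mathcal{Q}) := \left.\frac{d^{\mathrm{sym}} Z'(a)}{d_s a}\right|_{a=1} = \sum_i \frac{p_i^s q_i^{1-s} - p_i^{s^{-1}} q_i^{1-s^{-1}}}{s - s^{-1}}. \qquad (13)$$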



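We note that this generalized KL entropy is asymmetric, $\mathcal{L}_s(\mathcal{P}, \mathcal{Q}) \neq \mathcal{L}_s(\mathcal{Q}, \mathcal{P})$, and that the relation $\mathcal{L}_{s^{-1}}(\mathcal{P}, \mathcal{Q}) = \mathcal{L}_s(\mathcal{P}, \mathcal{Q})$ 
holds. A symmetric quantity can be constructed by adding $\mathcal{L}_s(\mathcal{P}, \mathcal{Q})$ and $\mathcal{L}_s(\mathcal{Q}, \mathcal{P})$. In the limit $s \to 1$, $\mathcal{L}_s(\mathcal{P}, \mathcal{Q})$ 
reduces to the usual KL entropy $\mathcal{K}(\mathcal{P}, \mathcal{Q})$, which can be easily checked by L'Hôpital's rule. This divergence 
is well-defined whenever the two distributions have common support (state numbers i). In other words, in order to 
have a finite value as a distance measure in the case 0 < s < 1, the probability $p_i$ must vanish when $q_i$ vanishes, and 
similar restrictions also apply for s > 1 and in the limit $s \to 1$. The decomposability property in the sense that we 
mentioned in section II is not expected for $\mathcal{L}_s$, since 

$$(s - s^{-1})\, \mathcal{L}_s(\mathcal{P}_1^{(1)} \cdots \mathcal{P}_1^{(n)}, \mathcal{P}_2^{(1)} \cdots \mathcal{P}_2^{(n)}) = \prod_{j=1}^{n} L_s^{(j)} - \prod_{j=1}^{n} L_{s^{-1}}^{(j)}, \qquad (14)$$

where $L_s^{(j)} = \int dx^{(j)}\, [\mathcal{P}_1^{(j)}]^s [\mathcal{P}_2^{(j)}]^{1-s}$, etc. It is interesting to note that $\mathcal{L}_s$ can be understood from the 
limiting case of the weighted power mean of order $\lambda$, which is defined for x, y > 0 as 

$$E_s^{\lambda}[x, y] := \left[s x^{\lambda} + (1 - s) y^{\lambda}\right]^{1/\lambda}, \qquad (15)$$

and the particular instances $E_{1/2}^{0}[x, y]$ and $E_{1/2}^{1}[x, y]$ correspond to the geometric mean $\sqrt{xy}$ and to the arithmetic mean 
$(x + y)/2$, respectively. Therefore we have 

$$\mathcal{L}_s(\mathcal{P}, \mathcal{Q}) = \lim_{\lambda \to 0} \sum_i \frac{E_s^{\lambda}[p_i, q_i] - E_{s^{-1}}^{\lambda}[p_i, q_i]}{s - s^{-1}}. \qquad (16)$$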
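
The definition of the generalized KL entropy and the properties stated in this section can be probed numerically. The following Python sketch is an illustration under assumptions: it takes $\mathcal{L}_s(\mathcal{P}, \mathcal{Q}) = \sum_i (p_i^s q_i^{1-s} - p_i^{1/s} q_i^{1-1/s})/(s - s^{-1})$ in the discrete form, the two test distributions are made-up values, and the function names `gen_kl`, `power_mean` and `gen_kl_from_means` are hypothetical.

```python
import math

def kl(p, q):
    """Ordinary KL entropy: sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

def gen_kl(p, q, s):
    """Generalized KL entropy from the symmetric Jackson derivative (s != 0, +1, -1)."""
    r = 1.0 / s
    total = sum(pi**s * qi**(1.0 - s) - pi**r * qi**(1.0 - r) for pi, qi in zip(p, q))
    return total / (s - r)

def power_mean(x, y, s, lam):
    """Weighted power mean of order lam: [s x^lam + (1 - s) y^lam]^(1/lam)."""
    return (s * x**lam + (1.0 - s) * y**lam) ** (1.0 / lam)

def gen_kl_from_means(p, q, s, lam=1e-6):
    """Small-lam approximation of the power-mean representation."""
    r = 1.0 / s
    total = sum(power_mean(pi, qi, s, lam) - power_mean(pi, qi, r, lam)
                for pi, qi in zip(p, q))
    return total / (s - r)

# Illustrative (assumed) distributions with common support.
p = [0.1, 0.2, 0.3, 0.4]
q = [0.25, 0.25, 0.25, 0.25]

near_one = gen_kl(p, q, 1.0001)        # approaches kl(p, q) as s -> 1
forward = gen_kl(p, q, 2.0)
inverse_index = gen_kl(p, q, 0.5)      # same value: invariance under s <-> 1/s
swapped = gen_kl(q, p, 2.0)            # differs from forward: asymmetry
from_means = gen_kl_from_means(p, q, 2.0)
```

In this example `near_one` is close to `kl(p, q)`, `forward` equals `inverse_index`, `swapped` differs from `forward`, and `from_means` reproduces `forward` up to an error of order `lam`.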



IV. SOME PROPERTIES OF $\mathcal{L}_s(\mathcal{P}, \mathcal{Q})$ 

A. Positive semi-definiteness 



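This property corresponds to the information inequality for the standard KL entropy, i.e., $\mathcal{K}(\mathcal{P}, \mathcal{Q}) \ge 0$. For 
$\mathcal{L}_s(\mathcal{P}, \mathcal{Q})$, the kernel function is $f(x) = (x^{1-s} - x^{1-s^{-1}})/(s - s^{-1})$, so that $\mathcal{L}_s(\mathcal{P}, \mathcal{Q}) = \sum_i p_i f(q_i/p_i)$. The second derivative 

$$f''(x) = \frac{1}{s - s^{-1}} \left[s(s - 1)\, x^{-1-s} + \frac{1}{s}\left(1 - \frac{1}{s}\right) x^{-1-\frac{1}{s}}\right] \qquad (17)$$

is always positive for s > 0, zero for s = 0, and can be negative for s < 0. Therefore, by Jensen's inequality 
$\sum_i \alpha_i f(x_i) \ge f(\sum_i \alpha_i x_i)$ for $f''(x) \ge 0$ with $\sum_i \alpha_i = 1$, by putting $x_i = q_i/p_i$ and $\alpha_i = p_i$ we obtain 

$$\frac{1}{s - s^{-1}} \sum_i p_i \left[\left(\frac{q_i}{p_i}\right)^{1-s} - \left(\frac{q_i}{p_i}\right)^{1-s^{-1}}\right] \ge f\Big(\sum_i q_i\Big) = f(1) = 0. \qquad (18)$$

Note that for $s \neq 0$, the equality in Eq. (18) holds iff $p_i = q_i$, $\forall i$. We note that when s > 0, f is a convex function. 
Accordingly, the positivity of $\mathcal{L}_s$ is found to be a direct consequence of Lemma 1.1 in [23], where the inequality 

$$\int_E p(x)\, f\!\left(\frac{q(x)}{p(x)}\right) m(dx) \ge \int_E p(x)\, m(dx) \cdot f(u_0) \qquad (19)$$

is proved for nonnegative measurable functions on a measure space $(X, \mathcal{X}, m)$ for $E \in \mathcal{X}$ ($\sigma$-algebra of subsets of X) 
and $u_0 = \int_E q(x)\, m(dx) / \int_E p(x)\, m(dx)$. The normalization of the probability functions in Eq. (19), when f satisfies 
f(1) = 0, results in positivity for our case. 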
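
As a sanity check of the positivity for s > 0, one may evaluate the discrete form $\sum_i (p_i^s q_i^{1-s} - p_i^{1/s} q_i^{1-1/s})/(s - s^{-1})$ on randomly drawn distributions. The sketch below is illustrative only; the sampling scheme, the seed, the list of indices and the names `gen_kl` and `random_distribution` are assumptions made for this example.

```python
import random

def gen_kl(p, q, s):
    """Generalized KL entropy in discrete form (s != 0, +1, -1)."""
    r = 1.0 / s
    total = sum(pi**s * qi**(1.0 - s) - pi**r * qi**(1.0 - r) for pi, qi in zip(p, q))
    return total / (s - r)

def random_distribution(n, rng):
    """Random strictly positive probability vector of length n."""
    weights = [rng.random() + 0.05 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)          # fixed seed so the run is reproducible
values = []
for _ in range(200):
    p = random_distribution(5, rng)
    q = random_distribution(5, rng)
    for s in (0.2, 0.5, 2.0, 5.0):          # positive indices only
        values.append(gen_kl(p, q, s))

smallest = min(values)                      # expected to be non-negative
self_distance = gen_kl(p, p, 2.0)           # vanishes for identical distributions
```

No negative value is expected in `values`, and `self_distance` vanishes up to rounding, in line with the equality condition $p_i = q_i$.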



B. Metric property 

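The infinitesimal shift in probability provides 

$$\mathcal{L}_s(\mathcal{P}, \mathcal{P} + d\mathcal{P}) = \sum_i p_i f\!\left(1 + \frac{dp_i}{p_i}\right) = \sum_{n=2}^{\infty} \frac{f^{(n)}(1)}{n!} \sum_i p_i \left(\frac{dp_i}{p_i}\right)^n, \qquad (20)$$

where f(x) is the same as above, and we have used the facts f(1) = 0 and $\sum_i dp_i = 0$. Since the second derivative is 
$f''(1) = [s(s - 1) + s^{-1}(1 - s^{-1})]/(s - s^{-1})$, we have $\mathcal{L}_s(\mathcal{P}, \mathcal{P} + d\mathcal{P}) \simeq 2^{-1} f''(1) \sum_i (dp_i)^2/p_i$. If we introduce the 
information metric (or Fisher-Rao metric) $ds^2$ as 

$$ds^2 = \sum_{i,j} g_{ij}\, dp^i dp^j, \qquad (21)$$

then the metric tensor is given by 

$$g_{ij} = \frac{s(s - 1) + s^{-1}(1 - s^{-1})}{2 p_i (s - s^{-1})}\, \delta_{ij}. \qquad (22)$$

In the limit $s \to 1$, this metric reduces to $\delta_{ij}/(2p_i)$. 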
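
The quadratic form behind the metric can be compared with the exact divergence for a small shift. The Python sketch below is a hedged illustration: the base distribution, the shift vector and the index s = 3 are assumed values, and `gen_kl` and `f2_at_one` are hypothetical helper names. It also checks the simplification $f''(1) = s + s^{-1} - 1$, which follows from the expression for $f''(1)$ by elementary algebra.

```python
def gen_kl(p, q, s):
    """Generalized KL entropy in discrete form (s != 0, +1, -1)."""
    r = 1.0 / s
    total = sum(pi**s * qi**(1.0 - s) - pi**r * qi**(1.0 - r) for pi, qi in zip(p, q))
    return total / (s - r)

def f2_at_one(s):
    """Second derivative of the kernel at x = 1."""
    return (s * (s - 1.0) + (1.0 / s) * (1.0 - 1.0 / s)) / (s - 1.0 / s)

p = [0.1, 0.2, 0.3, 0.4]                       # assumed base distribution
dp = [1.0e-4, -2.0e-4, 0.5e-4, 0.5e-4]         # small shift, components sum to zero
shifted = [pi + di for pi, di in zip(p, dp)]

s = 3.0
exact = gen_kl(p, shifted, s)
quadratic = 0.5 * f2_at_one(s) * sum(di**2 / pi for pi, di in zip(p, dp))
relative_error = abs(exact / quadratic - 1.0)  # shrinks with the size of dp
```

The remaining discrepancy between `exact` and `quadratic` is of third order in the shift.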



C. Form invariance 



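Under a transformation of coordinates $\theta \to \eta$, the distribution $p(\theta)$ may satisfy $p(\theta)\, d\theta = \phi(\eta)\, d\eta$, where $\phi(\eta)$ is the 
converted distribution. The resulting relation $p_1/p_2 = \phi_1/\phi_2$ shows that the distance between $\phi_1$ and $\phi_2$ measured 
by $\mathcal{L}_s$ remains unchanged, that is, the distance is equivalent to the one between $p_1$ and $p_2$ before the transformation, 

$$\mathcal{L}_s(\mathcal{P}_1, \mathcal{P}_2) = \frac{1}{s - s^{-1}} \int d\theta\, p_1(\theta) \left[\left(\frac{p_2(\theta)}{p_1(\theta)}\right)^{1-s} - \left(\frac{p_2(\theta)}{p_1(\theta)}\right)^{1-s^{-1}}\right] = \frac{1}{s - s^{-1}} \int d\eta\, \phi_1(\eta) \left[\left(\frac{\phi_2(\eta)}{\phi_1(\eta)}\right)^{1-s} - \left(\frac{\phi_2(\eta)}{\phi_1(\eta)}\right)^{1-s^{-1}}\right] = \mathcal{L}_s(\Phi_1, \Phi_2). \qquad (23)$$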



D. Concavity 



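By setting $\alpha_j = a_j/\sum_k a_k$ and $x_j = b_j/a_j$ in Jensen's inequality $\sum_j \alpha_j f(x_j) \ge f(\sum_j \alpha_j x_j)$ with the same 
f(x) as in section IV A, we have 

$$\frac{1}{\sum_k a_k} \sum_j a_j f\!\left(\frac{b_j}{a_j}\right) \ge f\!\left(\frac{\sum_j b_j}{\sum_j a_j}\right). \qquad (24)$$

We consider two states j = 1, 2 by putting $a_1 = \lambda p^1$ and $a_2 = (1 - \lambda) p^2$ to obtain $p = \lambda p^1 + (1 - \lambda) p^2$. Similarly, we 
set $b_1 = \lambda p'^1$ and $b_2 = (1 - \lambda) p'^2$ for $p' = \lambda p'^1 + (1 - \lambda) p'^2$, where $0 \le \lambda \le 1$. Substituting these into Eq. (24), 
multiplying by $\sum_k a_k$, and summing over the states of the distributions yields 

$$\lambda\, \mathcal{L}_s(p^1, p'^1) + (1 - \lambda)\, \mathcal{L}_s(p^2, p'^2) \ge \mathcal{L}_s(\lambda p^1 + (1 - \lambda) p^2,\ \lambda p'^1 + (1 - \lambda) p'^2). \qquad (25)$$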



This completes the proof. 
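
The mixing inequality, $\lambda\mathcal{L}_s(p^1, p'^1) + (1-\lambda)\mathcal{L}_s(p^2, p'^2) \ge \mathcal{L}_s(\lambda p^1 + (1-\lambda)p^2, \lambda p'^1 + (1-\lambda)p'^2)$, can be tested on random inputs for positive s. The following Python sketch is illustrative only; the sampling scheme, seed and index values are assumptions, and the helper names `gen_kl`, `random_distribution` and `mix` are hypothetical.

```python
import random

def gen_kl(p, q, s):
    """Generalized KL entropy in discrete form (s != 0, +1, -1)."""
    r = 1.0 / s
    total = sum(pi**s * qi**(1.0 - s) - pi**r * qi**(1.0 - r) for pi, qi in zip(p, q))
    return total / (s - r)

def random_distribution(n, rng):
    """Random strictly positive probability vector of length n."""
    weights = [rng.random() + 0.1 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

def mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two distributions."""
    return [lam * x + (1.0 - lam) * y for x, y in zip(a, b)]

rng = random.Random(1)
gaps = []
for _ in range(200):
    p1, p2, q1, q2 = (random_distribution(4, rng) for _ in range(4))
    lam = rng.random()
    s = rng.choice([0.3, 0.5, 2.0, 4.0])     # positive indices only
    lhs = lam * gen_kl(p1, q1, s) + (1.0 - lam) * gen_kl(p2, q2, s)
    rhs = gen_kl(mix(p1, p2, lam), mix(q1, q2, lam), s)
    gaps.append(lhs - rhs)

min_gap = min(gaps)    # expected to be non-negative up to rounding
```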
E. Stability 



We investigate the stability property explained in section II A. The distance measure between the original and the 
perturbed distribution is 

$$\mathcal{L}_s(\{p(y_i; \epsilon)\}, \{p(y_i)\}) = \frac{1}{s - s^{-1}} \sum_i \big(p(y_i) + c_i p(x_l)\xi\big) \left[\left(\frac{p(y_i) + c_i p(x_l)\xi}{p(y_i)}\right)^{s-1} - \left(\frac{p(y_i) + c_i p(x_l)\xi}{p(y_i)}\right)^{s^{-1}-1}\right]. \qquad (26)$$

Expanding $(1 + \xi c_i p(x_l)/p(y_i))^{s-1}$ etc. with respect to $\xi$ and taking terms up to second order in $\xi$, we obtain the 
expression 

$$\mathcal{L}_s(\{p(y_i; \epsilon)\}, \{p(y_i)\}) = \sum_i c_i p(x_l)\, \xi + \frac{1}{2} \zeta(s) \Big(\sum_i \frac{c_i^2\, p^2(x_l)}{p(y_i)}\Big) \xi^2 + O(\xi^3), \qquad (27)$$

where $\zeta(s) = s + s^{-1} - 1$. Considering the sign of the coefficient of $\xi^2$, we can conclude that $\mathcal{L}_s$ is stable when 
s > 0 and unstable when s < 0. Note that except for the factor $\zeta(s)$, the effect of the perturbation on the generalized 
distance is the same as on the ordinary KL, Eq. (5), up to second order in $\xi$. 
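
The sign change of the second-order coefficient with the sign of s can be seen in a small numerical experiment. The sketch below reuses the two-state perturbation of section II A; the channel matrix, the input distribution and the names `output`, `perturb`, `gen_kl` and `second_order_factor` are assumptions made for illustration. It compares the exact divergence with $\frac{1}{2}(s + s^{-1} - 1)\sum_i c_i^2 p^2(x_l)/p(y_i)\,\xi^2$, the first-order term being zero because $\sum_i c_i = 0$ for a normalized channel.

```python
# channel[i][k] = p(y_i | x_k); every column sums to one (assumed values).
channel = [[0.7, 0.2, 0.1],
           [0.2, 0.5, 0.3],
           [0.1, 0.3, 0.6]]
px = [0.5, 0.3, 0.2]   # input distribution, assumed
l, m = 0, 1            # states affected by the fluctuation

def output(p_in):
    """Output distribution p(y_i) = sum_k p(y_i | x_k) p(x_k)."""
    return [sum(row[k] * p_in[k] for k in range(len(p_in))) for row in channel]

def perturb(p_in, eps):
    """Two-state perturbation: scale state l by eps, compensate in state m."""
    p_new = list(p_in)
    p_new[l] = eps * p_in[l]
    p_new[m] = (1.0 - eps) * p_in[l] + p_in[m]
    return p_new

def gen_kl(p, q, s):
    """Generalized KL entropy in discrete form (s != 0, +1, -1)."""
    r = 1.0 / s
    total = sum(pi**s * qi**(1.0 - s) - pi**r * qi**(1.0 - r) for pi, qi in zip(p, q))
    return total / (s - r)

def second_order_factor(s):
    """Factor multiplying the ordinary KL coefficient: s + 1/s - 1."""
    return s + 1.0 / s - 1.0

xi = 1e-3
py = output(px)
py_eps = output(perturb(px, 1.0 + xi))
c = [row[l] - row[m] for row in channel]
coeff = sum(ci**2 * px[l]**2 / pyi for ci, pyi in zip(c, py))

ratios = {}
for s in (2.0, 0.5, -2.0):
    exact = gen_kl(py_eps, py, s)
    ratios[s] = exact / (0.5 * second_order_factor(s) * coeff * xi**2)
```

All entries of `ratios` are close to one, while `second_order_factor` is positive for the two positive indices and negative for s = -2.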



F. Upper and lower bounds for $\mathcal{L}_s(\mathcal{P}, \mathcal{Q})$ 



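The following bounds hold. 
Theorem IV.1. Let $p(x)$, $q(x)$, $x \in X$, be two probability distributions. Then we have the inequality: 

$$\frac{1}{2} \sum_{x \in X} \left[\left(\frac{q(x)}{p(x)}\right)^{1-s} + \left(\frac{q(x)}{p(x)}\right)^{1-s^{-1}}\right] p(x) \log\frac{p(x)}{q(x)} \ge \mathcal{L}_s(\mathcal{P}, \mathcal{Q}) \ge \sum_{x \in X} \left(\frac{q(x)}{p(x)}\right)^{1-\frac{s+s^{-1}}{2}} p(x) \log\frac{p(x)}{q(x)}. \qquad (28)$$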



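Proof. For a convex function $f(u) = t^{1-u}$ (0 < t < 1), we employ the Hermite-Hadamard inequality, which holds for 
convex functions, 

$$f\!\left(\frac{a + b}{2}\right) \le \frac{1}{b - a} \int_a^b f(u)\, du \le \frac{f(a) + f(b)}{2}, \quad (a, b > 0). \qquad (29)$$

Now, putting a = s and $b = s^{-1}$, we have the inequality 

$$t^{1-\frac{s+s^{-1}}{2}} \le -\frac{t^{1-s} - t^{1-s^{-1}}}{(s - s^{-1}) \log t} \le \frac{t^{1-s} + t^{1-s^{-1}}}{2}. \qquad (30)$$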



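Since $(t^{1-s} - t^{1-s^{-1}})/(s - s^{-1})$ takes positive values $\forall s$ for $t \in (0, 1)$, if we choose $t = q(x)/p(x)$ in Eq. (30), 
multiply by $p(x) \log[p(x)/q(x)]$, and sum over $x \in X$, we obtain 

$$\frac{1}{2} \sum_{x \in X} \left[\left(\frac{q(x)}{p(x)}\right)^{1-s} + \left(\frac{q(x)}{p(x)}\right)^{1-s^{-1}}\right] p(x) \log\frac{p(x)}{q(x)} \ge \frac{1}{s - s^{-1}} \sum_{x \in X} p(x) \left[\left(\frac{q(x)}{p(x)}\right)^{1-s} - \left(\frac{q(x)}{p(x)}\right)^{1-s^{-1}}\right] \ge \sum_{x \in X} \left(\frac{q(x)}{p(x)}\right)^{1-\frac{s+s^{-1}}{2}} p(x) \log\frac{p(x)}{q(x)}. \qquad (31)$$

The equality holds if and only if $s = s^{-1}$, i.e., $s = \pm 1$, which completes the desired bounds for $\mathcal{L}_s$. □ 

As for an upper bound for $\mathcal{L}_s$, the following expression also holds. 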
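Corollary IV.1. Let $p(x)$, $q(x)$ be as above. For $-1 < s < 0$ and $s > 1$, we have 

$$0 \le \mathcal{L}_s(\mathcal{P}, \mathcal{Q}) \le \frac{1}{s - s^{-1}} \sum_{x \in X} \frac{\left|\exp[p^s(x) q^{1-s}(x)] - \exp[p^{s^{-1}}(x) q^{1-s^{-1}}(x)]\right|}{\sqrt{\exp[p^s(x) q^{1-s}(x) + p^{s^{-1}}(x) q^{1-s^{-1}}(x)]}}. \qquad (32)$$

Proof. For $a \ge 0$ and $b \ge 0$, the geometric mean is smaller than or equal to the logarithmic mean, i.e., 

$$\sqrt{ab} \le \frac{b - a}{\ln b - \ln a}. \qquad (33)$$

Equivalently, the inequality $|\ln b - \ln a| \le |b - a|/\sqrt{ab}$ holds, with equality iff a = b. Therefore, setting 
$b = \exp[p^s(x) q^{1-s}(x)]$ and $a = \exp[p^{s^{-1}}(x) q^{1-s^{-1}}(x)]$, we have 

$$\left|p^s(x) q^{1-s}(x) - p^{s^{-1}}(x) q^{1-s^{-1}}(x)\right| \le \frac{\left|e^{p^s(x) q^{1-s}(x)} - e^{p^{s^{-1}}(x) q^{1-s^{-1}}(x)}\right|}{\sqrt{\exp[p^s(x) q^{1-s}(x) + p^{s^{-1}}(x) q^{1-s^{-1}}(x)]}}. \qquad (34)$$

From the positivity property proved in section IV A, we have $0 \le \mathcal{L}_s(\mathcal{P}, \mathcal{Q})$. Then 

$$(s - s^{-1})\, \mathcal{L}_s(\mathcal{P}, \mathcal{Q}) = \left|\sum_{x \in X} \left(p^s(x) q^{1-s}(x) - p^{s^{-1}}(x) q^{1-s^{-1}}(x)\right)\right| \le \sum_{x \in X} \left|p^s(x) q^{1-s}(x) - p^{s^{-1}}(x) q^{1-s^{-1}}(x)\right|. \qquad (35)$$

Summing over x in Eq. (34), we obtain the inequality Eq. (32). □ 

Another expression for the bound in terms of the l-norm is possible for the exponentiated differences. 
Corollary IV.2. Let $p(x)$, $q(x)$ be as above. Then we have 

$$0 \le \mathcal{L}_s(\mathcal{P}, \mathcal{Q}) \le \frac{1}{s - s^{-1}} \left\| e^{p^s(x) q^{1-s}(x)} - e^{p^{s^{-1}}(x) q^{1-s^{-1}}(x)} \right\|_{\alpha} \left\| \frac{1}{\sqrt{\exp[p^s(x) q^{1-s}(x) + p^{s^{-1}}(x) q^{1-s^{-1}}(x)]}} \right\|_{\beta}, \qquad (36)$$

where $\|t\|_l := \big[\sum_{x \in X} |t(x)|^l\big]^{1/l}$ for l > 0. 
Proof. From Hölder's inequality with $1/\alpha + 1/\beta = 1$, it immediately follows that 

$$\sum_{x \in X} \frac{\left|e^{p^s(x) q^{1-s}(x)} - e^{p^{s^{-1}}(x) q^{1-s^{-1}}(x)}\right|}{\sqrt{\exp[p^s(x) q^{1-s}(x) + p^{s^{-1}}(x) q^{1-s^{-1}}(x)]}} \le \left[\sum_{x \in X} \left|e^{p^s(x) q^{1-s}(x)} - e^{p^{s^{-1}}(x) q^{1-s^{-1}}(x)}\right|^{\alpha}\right]^{1/\alpha} \left[\sum_{x \in X} \left(\frac{1}{\sqrt{\exp[p^s(x) q^{1-s}(x) + p^{s^{-1}}(x) q^{1-s^{-1}}(x)]}}\right)^{\beta}\right]^{1/\beta}. \qquad (37)$$
□ 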
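
The chain of inequalities used for the upper bound can be exercised numerically for s > 1. The sketch below is a hedged illustration, not the paper's own procedure: it assumes the bound in the form $\mathcal{L}_s \le (s - s^{-1})^{-1}\sum_x |e^{B} - e^{A}|/\sqrt{e^{A+B}}$ with $B = p^s q^{1-s}$ and $A = p^{1/s} q^{1-1/s}$, uses random distributions bounded away from zero so that the exponentials stay moderate, and the names `gen_kl`, `upper_bound`, `log_mean` and `random_distribution` are hypothetical.

```python
import math
import random

def gen_kl(p, q, s):
    """Generalized KL entropy in discrete form (s != 0, +1, -1)."""
    r = 1.0 / s
    total = sum(pi**s * qi**(1.0 - s) - pi**r * qi**(1.0 - r) for pi, qi in zip(p, q))
    return total / (s - r)

def upper_bound(p, q, s):
    """Bound built from the geometric-mean / logarithmic-mean inequality (s > 1)."""
    r = 1.0 / s
    total = 0.0
    for pi, qi in zip(p, q):
        big = pi**s * qi**(1.0 - s)       # exponent B
        small = pi**r * qi**(1.0 - r)     # exponent A
        total += abs(math.exp(big) - math.exp(small)) / math.sqrt(math.exp(big + small))
    return total / (s - r)

def log_mean(a, b):
    """Logarithmic mean (b - a) / (ln b - ln a) for a != b."""
    return (b - a) / (math.log(b) - math.log(a))

def random_distribution(n, rng):
    """Random probability vector with entries between 0.1 and 0.5 for n = 4."""
    weights = [rng.random() + 0.5 for _ in range(n)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(2)
checks = []
for _ in range(200):
    p = random_distribution(4, rng)
    q = random_distribution(4, rng)
    for s in (1.5, 2.0, 3.0):
        checks.append((gen_kl(p, q, s), upper_bound(p, q, s)))

bound_holds = all(-1e-12 <= low <= high + 1e-12 for low, high in checks)
mean_inequality = math.sqrt(2.0 * 5.0) <= log_mean(2.0, 5.0)
```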



V. SHIFT INFORMATION FOR $\mathcal{L}_s(\mathcal{P}, \mathcal{Q})$ 

The notion of the shift information introduced in Ref. [24] is an interesting means of investigating our new distance 
measure, in that we may ascribe the infinitesimal shift to known quantities. The original definition of the shift 
information can be expressed by using the ordinary KL entropy as $\mathcal{K}(p(\zeta), p(\zeta + \Delta))$, where $\Delta$ is a sufficiently 
small quantity compared to the variable $\zeta$. As a consequence of the expansion, the Fisher information measure appears 
in the second-order term in the case of the usual KL entropy [24] and also in a generalized KL entropy [7]. We would 
expect that the shift information for the present generalization can also be expressed in terms of the 
Fisher information, where the generalization parameter s should govern the degree of the shift. We examine this 
below. The shift information is defined as 

$$I(\Delta, s) := \mathcal{L}_s(p(\zeta), p(\zeta + \Delta)) = \frac{1}{s - s^{-1}} \int d\zeta \left\{ p^s(\zeta) [p(\zeta + \Delta)]^{1-s} - p^{s^{-1}}(\zeta) [p(\zeta + \Delta)]^{1-s^{-1}} \right\}. \qquad (38)$$
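Expanding $[p(\zeta + \Delta)]^{1-\gamma}$ with respect to $\Delta$, 

$$p^{\gamma}(\zeta) [p(\zeta + \Delta)]^{1-\gamma} \simeq p(\zeta) + (1 - \gamma)\, p'(\zeta) \Delta + (1 - \gamma) \left[p''(\zeta) - \gamma \frac{(p'(\zeta))^2}{p(\zeta)}\right] \frac{\Delta^2}{2}, \qquad (39)$$

where $\gamma$ denotes either s or $s^{-1}$ and the prime denotes the derivative with respect to $\zeta$. Then the shift information 
can be expressed up to second order in $\Delta$ as 

$$I(\Delta, s) = \frac{1}{s - s^{-1}} \int d\zeta \left\{ (s^{-1} - s)\, p'(\zeta) \Delta + \left[(s^{-1} - s)\, p''(\zeta) + a(s) \frac{(p'(\zeta))^2}{p(\zeta)}\right] \frac{\Delta^2}{2} \right\}, \qquad (40)$$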



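where we have put $a(s) = s(s - 1) + s^{-1}(1 - s^{-1})$. Therefore we find that the Fisher information $\int d\zeta\, (p')^2/p$ is the 
relevant quantity at second order in the shift $\Delta$. Moreover, the variation of $I(\Delta, s)$ is given by 

$$\delta I(\Delta, s) = \frac{1}{s - s^{-1}} \int d\zeta\, \delta p \left[\frac{\partial}{\partial p} - \frac{d}{d\zeta} \frac{\partial}{\partial p'} + \frac{d^2}{d\zeta^2} \frac{\partial}{\partial p''}\right] \left\{ (s^{-1} - s)\, p' \Delta + \left[(s^{-1} - s)\, p'' + a(s) \frac{(p')^2}{p}\right] \frac{\Delta^2}{2} \right\} = \frac{a(s)}{s - s^{-1}}\, \delta I_{KL}(\Delta), \qquad (41)$$

where $\delta I_{KL}(\Delta)$ is the variation calculated for the shift information for the ordinary KL [24], 

$$\delta I_{KL}(\Delta) = \frac{\Delta^2}{2} \int d\zeta\, \delta p \left[\left(\frac{p'}{p}\right)^2 - \frac{2 p''}{p}\right]. \qquad (42)$$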



This result indicates that the variation simply differs by a factor from the one obtained for the ordinary KL, whose 
degree is controlled by the index s. If the second derivative of the distance measure is taken as the quantity 
responsible for keeping the discernible interval between the shifted and the original distribution, then the sign of $\partial^2 I(\Delta, s)/\partial\Delta^2$ 
would be a signature of this stability. As an example, consider a Gaussian form as a representative distribution which 
appears in many disciplines. The Fisher information is calculated to be $\sigma^{-2}$ for the domain $\zeta \in (-\infty, \infty)$, where $\sigma$ is 
the standard deviation of the Gaussian distribution function. By straightforward calculation, we obtain 

$$\frac{\partial^2 I(\Delta, s)}{\partial \Delta^2} = \frac{a(s)}{(s - s^{-1})\, \sigma^2}. \qquad (43)$$



We find that $a(s)/(s - s^{-1}) \gtrless 0$ when $s \gtrless 0$, indicating that the information is stable against the shift $\Delta$ when s > 0. 
We note that this result is consistent with the conclusion derived from the two-level perturbation approach obtained 
in section IV E, where the corresponding $\mathcal{L}_s$ is stable (unstable) if s > 0 (s < 0). In this sense, the two different 
approaches for investigation of the stability associated with the distinguishability can be regarded as equivalently 
informative. It is worth mentioning that in the case of the Renyi relative entropy we obtain the shift information as 

$$I_R(\Delta, s) := \frac{1}{s - 1} \ln \int d\zeta\, p(\zeta) \left[\frac{p(\zeta + \Delta)}{p(\zeta)}\right]^{1-s} \simeq \frac{1}{s - 1} \ln \left\{ 1 + (1 - s) \Delta \int d\zeta\, p' + (1 - s) \frac{\Delta^2}{2} \int d\zeta \left[p'' - s \frac{(p')^2}{p}\right] \right\}. \qquad (44)$$

For the Gaussian distribution, the second derivative is calculated as 

$$\frac{\partial^2 I_R(\Delta, s)}{\partial \Delta^2} = \frac{s(s - 1)(\Delta - 1)^2 + 2\sigma^2 - s(s - 1)}{[2\sigma^2 - s(1 - s)\Delta^2]^2}. \qquad (45)$$



Therefore, when $\sigma^2 > s(s - 1)/2$ is satisfied, $I_R(\Delta, s)$ is found to be stable. The remarkable difference between $I(\Delta, s)$ 
and $I_R(\Delta, s)$ is that the stability is controlled only by s for our shift information, whereas the form of the distribution 
(i.e., the magnitude of $\sigma$) imposes a restriction on the domain of s in the case of the Renyi shift information in 
general. 
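
The Gaussian curvature of the shift information for the present measure can be checked by direct numerical integration. The Python sketch below is an illustration under assumptions: it uses a zero-mean Gaussian with an assumed $\sigma = 1.5$, a simple trapezoidal rule, and the estimate $2 I(\Delta, s)/\Delta^2$ for the second derivative at small $\Delta$ (valid because $I$ and its first derivative vanish at $\Delta = 0$ for this density). The names `log_gauss`, `generalized_overlap` and `shift_information` are hypothetical. The result is compared with $(s + s^{-1} - 1)/\sigma^2$, which equals $a(s)/[(s - s^{-1})\sigma^2]$.

```python
import math

def log_gauss(z, sigma):
    """Logarithm of a zero-mean Gaussian density with standard deviation sigma."""
    return -z * z / (2.0 * sigma * sigma) - math.log(sigma * math.sqrt(2.0 * math.pi))

def generalized_overlap(gamma, delta, sigma, half_width=12.0, n=4000):
    """Trapezoidal estimate of the integral of p(z)^gamma * p(z + delta)^(1 - gamma)."""
    start = -half_width * sigma
    step = 2.0 * half_width * sigma / n
    total = 0.0
    for k in range(n + 1):
        z = start + k * step
        weight = 0.5 if k in (0, n) else 1.0
        exponent = gamma * log_gauss(z, sigma) + (1.0 - gamma) * log_gauss(z + delta, sigma)
        total += weight * math.exp(exponent)
    return total * step

def shift_information(delta, s, sigma):
    """Shift information built from the generalized KL entropy."""
    r = 1.0 / s
    return (generalized_overlap(s, delta, sigma)
            - generalized_overlap(r, delta, sigma)) / (s - r)

sigma, delta = 1.5, 0.01      # assumed illustrative values
curvatures = {s: 2.0 * shift_information(delta, s, sigma) / delta**2
              for s in (2.0, 0.5, -2.0)}
predicted = {s: (s + 1.0 / s - 1.0) / sigma**2 for s in curvatures}
```

The entries of `curvatures` follow `predicted` closely, positive for the positive indices and negative for s = -2, independently of the value chosen for `sigma`.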
VI. BEHAVIOR UNDER LIOUVILLE DYNAMICS 



We shall show in this section that two states cannot become more distinguishable in the course of a dynamical 
evolution when the distance between them is measured by the present one. In other words, the generalized KL 
entropy $\mathcal{L}_s$ does not increase with time; instead, it is shown to be constant in time under the Liouville equation 

$$\frac{\partial p}{\partial t} + \nabla \cdot (\mathbf{v} p) = 0, \qquad (46)$$

where $p(\zeta, t)$ denotes a probability density describing a statistical ensemble of dynamical systems and $\mathbf{v} = d\zeta/dt$ stands 
for the drift velocity. The time derivative of the generalized KL entropy for two arbitrary probability distributions 
which satisfy the Liouville equation is 

$$\frac{d\mathcal{L}_s}{dt} = \frac{1}{s - s^{-1}} \frac{d}{dt} \int d\zeta \left[p_1^{s} p_2^{1-s} - p_1^{1/s} p_2^{1-1/s}\right] = \frac{1}{s - s^{-1}} \int d\zeta \left[h_1(p_1, p_2) \frac{\partial p_1}{\partial t} + h_2(p_1, p_2) \frac{\partial p_2}{\partial t}\right], \qquad (47)$$



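where 

$$h_1 = s \left(\frac{p_2}{p_1}\right)^{1-s} - \frac{1}{s} \left(\frac{p_2}{p_1}\right)^{1-\frac{1}{s}}, \qquad h_2 = (1 - s) \left(\frac{p_2}{p_1}\right)^{-s} - \left(1 - \frac{1}{s}\right) \left(\frac{p_2}{p_1}\right)^{-\frac{1}{s}}. \qquad (48)$$

Substituting Eq. (46) into Eq. (47), then using $f \nabla g = \nabla(fg) - (\nabla f) g$, we obtain for the integral of Eq. (47) 

$$\int d\zeta \left[h_1 \frac{\partial p_1}{\partial t} + h_2 \frac{\partial p_2}{\partial t}\right] = \int d\zeta\, \mathbf{v} \cdot \nabla\!\left(\frac{p_2}{p_1}\right) \left\{ \left[s(1 - s) \left(\frac{p_2}{p_1}\right)^{-s} - \frac{1}{s}\left(1 - \frac{1}{s}\right) \left(\frac{p_2}{p_1}\right)^{-\frac{1}{s}}\right] p_1 - \left[s(1 - s) \left(\frac{p_2}{p_1}\right)^{-s-1} - \frac{1}{s}\left(1 - \frac{1}{s}\right) \left(\frac{p_2}{p_1}\right)^{-\frac{1}{s}-1}\right] p_2 \right\}, \qquad (49)$$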



where we have assumed that the two probability distributions vanish at the boundary, so that $\int \nabla(fg)\, d\zeta = 0$ 
holds. The quantity within the curly brackets is calculated to be zero; therefore, $\mathcal{L}_s$ is found to be an invariant measure 
under this dynamics for all values of s. It is worth mentioning that relative entropies of the form $\int d\zeta\, \mathcal{P}_1 f(\mathcal{P}_2/\mathcal{P}_1)$ 
(the Csiszar f-divergence), where the function f is convex and satisfies f(1) = 0, become constant in time under 
the Liouville type dynamical evolution [9, 10, 11, 25]. The fact that $d\mathcal{L}_s/dt = 0$ under the Liouville equation proved 
above is consistent with this observation because $\mathcal{L}_s$ is a particular instance of the Csiszar f-divergence class. 
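
The invariance can be illustrated with an explicitly solvable one-dimensional example. The sketch below is only an illustration under assumptions: it takes the contracting drift $v = -\zeta$, for which $p(\zeta, t) = e^{t} p_0(\zeta e^{t})$ solves $\partial p/\partial t + \partial(v p)/\partial\zeta = 0$, two assumed Gaussian initial densities, the index s = 2, and a trapezoidal rule on a fixed grid. The names `log_gauss`, `log_density` and `gen_kl_continuous` are hypothetical.

```python
import math

def log_gauss(z, mu, sigma):
    """Logarithm of a Gaussian density with mean mu and standard deviation sigma."""
    return -(z - mu) ** 2 / (2.0 * sigma * sigma) - math.log(sigma * math.sqrt(2.0 * math.pi))

def log_density(z, t, mu0, sigma0):
    """Log of p(z, t) = e^t p0(z e^t), the solution for the drift v = -z."""
    return t + log_gauss(z * math.exp(t), mu0, sigma0)

def gen_kl_continuous(t, s, half_width=15.0, n=20000):
    """Trapezoidal estimate of the generalized KL entropy between the two evolved densities."""
    r = 1.0 / s
    step = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * step
        lp1 = log_density(z, t, 0.0, 1.0)    # first ensemble: N(0, 1) at t = 0
        lp2 = log_density(z, t, 1.0, 1.5)    # second ensemble: N(1, 1.5^2) at t = 0
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * (math.exp(s * lp1 + (1.0 - s) * lp2)
                           - math.exp(r * lp1 + (1.0 - r) * lp2))
    return total * step / (s - r)

values = [gen_kl_continuous(t, 2.0) for t in (0.0, 0.5, 1.0)]
spread = max(values) - min(values)    # expected to vanish up to quadrature error
```

Although both densities narrow as time proceeds, the computed divergence stays the same at the three sampled times.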



VII. CONCLUSIONS 



We have investigated properties of a novel generalized KL divergence in the context of statistical physics and 
information theory. Our approach presents a unified recipe for constructing distance measures for probability 
distributions. In this method, the ordinary KL divergence is obtained by differentiating the generalized overlap 
with respect to the overlap index a and evaluating it at a = 1. Similarly, the previously reported generalization 
of the KL divergence, which is consistent with the nonextensive entropy proposed in the physics literature, can be 
regarded as an output of the Jackson derivative of the overlap evaluated at a = 1. Along this line, we can define 
a family of distance measures by applying the symmetric Jackson derivative to the generalized overlap. We have 
chosen a = 1 to obtain a specific generalization, which belongs to the Csiszar f-divergence type, and have shown some 
fundamental properties of the divergence measure. As far as the distance between two probability distributions is 
concerned, the KL relative entropy has infinitely many generalizations even with our recipe, depending on the evaluation 
index. The connection to an interpretation of the information gain would provide the corresponding generalized 
information content. We have obtained the ratio of the variation of the shift information to that of the ordinary one, 
$\delta I(\Delta, s)/\delta I_{KL}(\Delta)$. In closing, we remark on a possible application of the divergence in the light of the minimum 
KL divergence scheme. In [26], this minimization formalism was applied to approximately obtain solutions of the 
general N-dimensional linear Fokker-Planck equations. Following this reasoning, the newly introduced divergence 
could be useful for finding approximate solutions to nonlinear Fokker-Planck equations and the related time evolution 
equations. Developing this approach would require future investigation. 